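Visual cameras are attractive sensors for beyond-visual-line-of-sight (B-VLOS) drone operation: they are low in size, weight, power, and cost, and they provide a redundant modality when GPS fails. However, state-of-the-art visual localization algorithms cannot match visual data whose appearance differs significantly because of illumination or viewpoint. This paper presents Isimloc, a condition- and viewpoint-consistent hierarchical global re-localization approach. The place features of Isimloc can be used to search for target images under changing appearances and viewpoints. In addition, our hierarchical global re-localization module refines its estimate in a coarse-to-fine manner, which allows Isimloc to perform fast and accurate estimation. We evaluate our method on one dataset with appearance variations and on one dataset that focuses on large-scale matching over a long flight in complicated environments. On our two datasets, Isimloc achieves successful retrieval rates of 88.7% and 83.8% with an inference time of 1.5 s, compared with 45.8% and 39.7% for the next-best method. These results demonstrate robust localization in a range of environments.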
Translated by Google Translate
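The coarse-to-fine refinement described in the Isimloc abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptors, the grouping of reference images into regions, and the names `cosine`, `mean_vector`, and `coarse_to_fine_search` are all assumptions made for illustration. The coarse stage ranks regions by the similarity of the query to each region's mean descriptor, and the fine stage compares the query only against the images inside the best-ranked regions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def coarse_to_fine_search(query, regions, top_regions=1):
    """Return (image_id, score) of the best match for `query`.

    `regions` maps a hypothetical region id to a list of
    (image_id, descriptor) pairs.
    """
    # Coarse stage: rank regions by similarity to their mean descriptor.
    ranked = sorted(
        regions,
        key=lambda r: cosine(query, mean_vector([d for _, d in regions[r]])),
        reverse=True,
    )
    # Fine stage: exhaustive comparison inside the best regions only.
    best_id, best_score = None, -1.0
    for region_id in ranked[:top_regions]:
        for image_id, descriptor in regions[region_id]:
            score = cosine(query, descriptor)
            if score > best_score:
                best_id, best_score = image_id, score
    return best_id, best_score


regions = {
    "north": [("n0", [1.0, 0.0]), ("n1", [0.9, 0.1])],
    "south": [("s0", [0.0, 1.0]), ("s1", [0.1, 0.9])],
}
best_id, best_score = coarse_to_fine_search([0.85, 0.15], regions)
```

In this toy setup the fine stage evaluates two descriptors instead of four; with many regions, pruning of this kind is what keeps retrieval fast, at the cost of missing matches that lie in a region the coarse stage discarded.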
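Place recognition is a fundamental module that assists simultaneous localization and mapping (SLAM) with loop-closure detection and re-localization for long-term navigation. Over the past 20 years, the place recognition community has made remarkable progress, attracting broad research interest and applications in fields such as computer vision and robotics. However, few methods show promising place recognition performance in complex real-world scenarios, where long-term and large-scale appearance changes usually cause failures. Moreover, state-of-the-art methods lack an integrated framework that handles all of the challenges together, including appearance changes, viewpoint differences, robustness to unknown areas, and efficiency in real-world applications. In this work, we survey state-of-the-art methods that target long-term localization and discuss future directions and opportunities. First, we examine the formulation of place recognition in long-term autonomy and the major challenges it faces in real-world environments. We then review recent work on place recognition across different sensor modalities, along with current strategies for dealing with the various place recognition challenges. Finally, we review existing datasets for long-term localization and introduce our own datasets and evaluation API for different approaches. This paper can serve as a tutorial for researchers new to the place recognition community and for those who care about long-term robot autonomy. We also offer our view on a frequently asked question in robotics: do robots need accurate localization to achieve long-term autonomy? A summary of this work, together with our datasets and evaluation API, is publicly available to the robotics community at: https://github.com/metaslam/gprs.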
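We present Automerge, a LiDAR data processing framework for assembling a large number of map segments into a complete map. Traditional large-scale map merging methods are fragile to incorrect data associations and are mostly limited to offline operation. Automerge uses multi-perspective fusion and adaptive loop-closure detection for accurate data association, and it uses incremental merging to assemble large maps from individual trajectory segments given in random order and without initial estimates. Furthermore, after assembling the segments, Automerge performs fine matching and pose-graph optimization to smooth the merged map globally. We demonstrate Automerge on city-scale merging (120 km) and campus-scale repeated merging (4.5 km x 8). The experiments show that Automerge (i) surpasses the second- and third-best methods by 14% and 24% recall in segment retrieval, (ii) achieves comparable 3D mapping accuracy for 120 km large-scale map assembly, and (iii) is robust to temporally spaced revisits. To the best of our knowledge, Automerge is the first mapping approach that can merge hundreds of kilometers of individual segments without the aid of GPS.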
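For long-term autonomy, most place recognition methods are evaluated mainly on simplified scenarios or simulated datasets, which cannot provide solid evidence about the readiness of current simultaneous localization and mapping (SLAM) systems. In this paper, we present a long-term place recognition dataset for mobile localization in large-scale dynamic environments. The dataset includes a campus-scale track and a city-scale track: 1) The campus track focuses on the long-term property: we record data with a LiDAR device and an omnidirectional camera on 10 trajectories, and each trajectory is recorded 8 times under varying illumination conditions. 2) The city track focuses on the large-scale property: we mount the LiDAR device on a vehicle and traverse 120 km of trajectories covering a variety of scenarios in urban environments. Ground-truth positions are provided for every trajectory in both tracks; they are obtained from the Global Positioning System with an additional ICP-based point cloud refinement. To simplify evaluation, we also provide a Python API with a set of place recognition metrics for quickly loading our dataset and evaluating recognition performance across different methods. The dataset aims to identify methods with high place recognition accuracy and robustness and to support real robotic systems with long-term autonomy. The dataset and the accompanying tools can be accessed at https://github.com/metaslam/alita.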
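The abstract above mentions a set of place recognition metrics without listing them. As one plausible example, the sketch below computes recall@N against ground-truth positions. It is not the dataset's actual API: the function name `recall_at_n`, the 10 m success radius, and the planar (x, y) position format are assumptions made for illustration.

```python
import math


def recall_at_n(query_positions, retrieved_positions, n=1, radius_m=10.0):
    """Fraction of queries with a correct place among the top-n retrievals.

    query_positions: list of ground-truth (x, y) positions in metres.
    retrieved_positions: for each query, a list of (x, y) positions of the
        retrieved places, ranked best first.
    A retrieval counts as correct when it lies within `radius_m` of the
    query's ground-truth position (the radius is an assumed threshold).
    """
    if not query_positions:
        return 0.0
    hits = 0
    for query, candidates in zip(query_positions, retrieved_positions):
        if any(math.dist(query, c) <= radius_m for c in candidates[:n]):
            hits += 1
    return hits / len(query_positions)


queries = [(0.0, 0.0), (100.0, 0.0)]
retrieved = [
    [(3.0, 4.0), (50.0, 50.0)],   # top-1 is 5 m away: correct
    [(0.0, 0.0), (100.0, 5.0)],   # top-1 is 100 m away, top-2 is 5 m away
]
recall_1 = recall_at_n(queries, retrieved, n=1)
recall_2 = recall_at_n(queries, retrieved, n=2)
```

Here `recall_1` is 0.5 and `recall_2` is 1.0, because the second query is only matched correctly by its second-ranked candidate.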
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
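One way to picture the reliability-based approximation mentioned above is to measure how well annotators agree with each other and to treat that agreement as a rough ceiling for meaningful model-to-annotation similarity. The sketch below is an illustration under stated assumptions, not the paper's procedure: it assumes the Dice coefficient as the similarity measure and binary masks flattened to lists, and the names `dice` and `mean_pairwise_dice` are hypothetical.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as flat sequences."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention.
    return 2.0 * intersection / total if total else 1.0


def mean_pairwise_dice(masks):
    """Mean Dice over all pairs of annotations of the same case."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two annotations")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical raters annotating the same four-pixel case.
raters = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]
inter_rater_agreement = mean_pairwise_dice(raters)
```

For these toy masks the pairwise scores are 0.8, 2/3, and 0.5. Under the assumptions above, a model scoring well beyond the mean of those values against a single rater would be matching that rater more closely than the raters match each other.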
A "heart attack", or myocardial infarction (MI), occurs when an artery supplying blood to the heart is abruptly occluded. The "gold standard" method for imaging MI is Cardiovascular Magnetic Resonance Imaging (MRI) with intravenously administered gadolinium-based contrast (late gadolinium enhancement). However, no "gold standard" fully automated method for the quantification of MI exists. In this work, we propose an end-to-end fully automatic system (MyI-Net) for the detection and quantification of MI in MRI images. This has the potential to reduce the uncertainty due to the technical variability across labs and inherent problems of the data and labels. Our system consists of four processing stages designed to maintain the flow of information across scales. First, features from raw MRI images are generated using feature extractors built on ResNet and MobileNet architectures. This is followed by Atrous Spatial Pyramid Pooling (ASPP), which produces spatial information at different scales to preserve more image context. High-level features from ASPP and initial low-level features are concatenated at the third stage and then passed to the fourth stage, where spatial information is recovered via up-sampling to produce the final image segmentation output into: i) background, ii) heart muscle, iii) blood and iv) scar areas. New models were compared with state-of-the-art models and manual quantification. Our models showed favorable performance in global segmentation and scar tissue detection relative to state-of-the-art work, including a four-fold better performance in matching scar pixels to contours produced by clinicians.
Graph neural networks (GNN) have become the default machine learning model for relational datasets, including protein interaction networks, biological neural networks, and scientific collaboration graphs. We use tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. The derived curves are phenomenologically rich: they explain the distinction between learning on homophilic and heterophilic graphs and they predict double descent whose existence in GNNs has been questioned by recent work. Our results are the first to accurately explain the behavior not only of a stylized graph learning model but also of complex GNNs on messy real-world datasets. To wit, we use our analytic insights about homophily and heterophily to improve performance of state-of-the-art graph neural networks on several heterophilic benchmarks by a simple addition of negative self-loop filters.
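The effect of a negative self-loop on a heterophilic graph can be seen in a very small example. The abstract does not give the filter's form, so the sketch below assumes a propagation step of the form (A + cI)X, where c is the self-loop weight; the function name `propagate` and the two-node graph are illustrative only.

```python
def propagate(adjacency, features, self_loop=1.0):
    """Compute (A + self_loop * I) @ X using plain nested lists."""
    n = len(adjacency)
    width = len(features[0])
    out = []
    for i in range(n):
        row = []
        for k in range(width):
            # Contribution of the node's own feature, scaled by the self-loop.
            total = self_loop * features[i][k]
            # Contribution of the neighbours.
            total += sum(adjacency[i][j] * features[j][k] for j in range(n))
            row.append(total)
        out.append(row)
    return out


# Heterophilic toy graph: two connected nodes with opposite features.
A = [[0, 1],
     [1, 0]]
X = [[1.0],
     [-1.0]]

positive = propagate(A, X, self_loop=1.0)
negative = propagate(A, X, self_loop=-1.0)
```

With a positive self-loop each node's own feature cancels against its neighbour's, so both outputs are 0.0 and the class signal is lost. With a negative self-loop the outputs are -2.0 and 2.0: the two nodes remain separable, which is consistent with the abstract's claim that such filters help on heterophilic benchmarks.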
In this paper, we propose a new neural network architecture based on the H2 matrix. Although networks with H2-inspired architectures already exist, our approach is designed to reduce memory costs and improve performance by taking into account the sparsity pattern of the H2 matrix. In numerical comparisons with alternative neural networks, including the known H2-based ones, our architecture proved beneficial in terms of performance, memory, and scalability.
Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects. This work presents DisCoScene: a 3D-aware generative model for high-quality and controllable scene synthesis. The key ingredient of our method is a very abstract object-level representation (i.e., 3D bounding boxes without semantic annotation) as the scene layout prior, which is simple to obtain, general enough to describe various scene contents, and yet informative enough to disentangle objects and background. Moreover, it serves as an intuitive user control for scene editing. Based on such a prior, the proposed model spatially disentangles the whole scene into object-centric generative radiance fields by learning on only 2D images with global-local discrimination. Our model obtains the generation fidelity and editing flexibility of individual objects while being able to efficiently compose objects and the background into a complete scene. We demonstrate state-of-the-art performance on many scene datasets, including the challenging Waymo outdoor dataset. Project page: https://snap-research.github.io/discoscene/
Semi-supervised learning (SSL) has made significant strides in the field of remote sensing. Large labeled datasets for SSL methods are uncommon, and manually labeling datasets is expensive and time-consuming. Furthermore, accurately identifying remote sensing satellite images is more complicated than it is for conventional images. Class-imbalanced datasets are another prevalent phenomenon, and models trained on them become biased towards the majority classes, which is a critical cause of subpar SSL model performance. We aim to address the issue of labeling unlabeled data and also solve the model bias problem due to imbalanced datasets while achieving better accuracy. To accomplish this, we create "artificial" labels and train a model to have reasonable accuracy. We iteratively redistribute the classes through resampling using a distribution alignment technique. We use a variety of class-imbalanced satellite image datasets: EuroSAT, UCM, and WHU-RS19. On the balanced UCM dataset, our method outperforms previous methods MSMatch and FixMatch by 1.21% and 0.6%, respectively. For imbalanced EuroSAT, our method outperforms MSMatch and FixMatch by 1.08% and 1%, respectively. Our approach significantly lessens the requirement for labeled data, consistently outperforms alternative approaches, and resolves the issue of model bias caused by class imbalance in datasets.
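The abstract names a distribution alignment technique without describing it. A common formulation in the SSL literature rescales each pseudo-label prediction by the ratio between a target class distribution and the model's average predicted distribution, then renormalizes. The sketch below assumes that formulation; whether the paper uses exactly this variant is not stated, and the name `align_distribution` is hypothetical.

```python
def align_distribution(probs, target_dist, model_dist, eps=1e-8):
    """Rescale class probabilities towards a target class distribution.

    probs: predicted class probabilities for one unlabeled sample.
    target_dist: desired class distribution (e.g. uniform).
    model_dist: running average of the model's predictions per class.
    """
    # Up-weight classes the model under-predicts, down-weight the rest.
    scaled = [
        p * t / max(m, eps)
        for p, t, m in zip(probs, target_dist, model_dist)
    ]
    total = sum(scaled)
    return [s / total for s in scaled]


# A model biased towards class 0 (it predicts class 0 about 80% of the time).
aligned = align_distribution(
    probs=[0.6, 0.4],
    target_dist=[0.5, 0.5],
    model_dist=[0.8, 0.2],
)
```

In this toy case the raw prediction favours class 0, but after alignment class 1 receives the larger probability, which illustrates how the correction counteracts a bias towards the majority class.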